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Abstract. Core-collapse supernovae are among Nature's most energetic events. They mark 
the end of massive star evolution and pollute the interstellar medium with the life-enabling ashes 
of thermonuclear burning. Despite their importance for the evolution of galaxies and life in the 
universe, the details of the core-collapse supernova explosion mechanism remain in the dark 
and pose a daunting computational challenge. We outline the multi-dimensional, multi-scale, 
and multi-physics nature of the core-collapse supernova problem and discuss computational 
strategies and requirements for its solution. Specifically, we highlight the axisymmetric (2D) 
radiation-MHD code VULCAN/2D and present results obtained from the first full-2D angle- 
dependent neutrino radiation-hydrodynamics simulations of the post-core-bounce supernova 
evolution. We then go on to discuss the new code Zelmani which is based on the open-source 
HPC Cactus framework and provides a scalable AMR approach for 3D fully general-relativistic 
modeling of stellar collapse, core-collapse supernovae and black hole formation on current and 
future massively-parallel HPC systems. We show Zelmani 's scaling properties to more than 
16,000 compute cores and discuss first 3D general-relativistic core-collapse results. 



1. Introduction 

Core-collapse supernova (CCSN) explosions are powered by the release of gravitational energy 
in the collapse of a massive star's core to a protoneutron star (PNS). While this general CCSN 
picture may be clear, its details have evaded understanding despite many decades of concerted 
theoretical and numerical effort. 

When thermonuclear core burning ends, the core of a massive star (i.e., M > 8 — 10 solar 
masses [-M©]) is comprised of iron-group nuclei (or O/Ne nuclei in the lowest-mass massive 
stars) and supported against gravity's pull primarily by the degeneracy pressure of relativistic 
electrons. Shell burning adds mass to this core and eventually pushes it over its maximum 
supportable mass. Gravitational collapse results and is accelerated by the capture of electrons 
on free and bound protons and by the photodissociation of heavy nuclei into alphas and 
nucleons. Core collapse continues until the central, subsonically-collapsing region (the inner 



core of ~ 0.5 M Q ) reaches nuclear density. There, the strong nuclear force kicks in, stiffening 
the nuclear equation of state (EOS) and halting collapse, resulting in the rebound of the inner 
core into the still collapsing outer core. A hydrodynamic shock is formed in this core bounce 
and initially propagates outward in mass and radius while losing energy to the dissociation of 
heavy infalling nuclei and neutrinos that stream away from the postshock region. The shock 
stalls within ~ 10 — 20 ms after bounce and at a radius of ~ 100 — 200 km. For a successful 
CCSN explosion, it must be re-energized, otherwise continued accretion will push the PNS over 
its mass limit. This results in a second phase of gravitational collapse, leading to the formation 
of a black hole, turning the stellar collapse event into a collapsai0 [3] • 

Finding and understanding the mechanism of shock revival has been the key problem of core- 
collapse supernova theory in the past ~ 45+ years. Current theory and modeling suggests 
(see, e.g., [1, 0] and references therein) that there are (at least) three ways to blow up massive 
stars: (1) the neutrino mechanism, relying primarily on an imbalance between charged-current 
neutrino heating and cooling in the immediate postshock regions in combination with convection 
and the standing-accretion-shock instability (SASI) (e.g., Q, S|), (2) the magnetorotational 
mechanism, based on bipolar jets created by strong magnetic stresses in rapidly rotating cores 
(e.g., [1, E3) EH] ) and (3) the acoustic mechanism recently proposed by Burrows et al. 12, 13, 14] 



which rests on the excitation of non-radial pulsations in the PNS by accretion and turbulence and 
their damping by strong sound waves that steepen into shocks, depositing energy very efficiently 
in the postshock region. The acoustic mechanism has not yet been confirmed by other groups 
and perturbative work suggests that the pulsation amplitudes may be limited by a parametric 
instability that transfers energy in faster damping daughter modes [TBI ], but this mechanism 
remains a compelling possiblity. 

In the three potential explosion mechanisms, the breaking of spherical symmetry is either 
fundamentally necessary or a key facilitating factor, making multi-D modeling a necessity. 
Furthermore, a CCSN simulation must not only spatially resolve the steep gradients at the 
PNS surface and the small to l arg e scale turbulent eddies of overturn (typical Ax ~ O(100 m) 
or less for magnetoturbulence (lU [l(J), but the simulation domain must also extend to many 
thousand kilometers to encompass sufficient material to track a long-term postbounce accretion 
phase without boundary effects affecting the evolution and to allow a potential explosion to 
fully develop. Hence, multi-scale modeling is required and may be implemented via fixed and/or 
adaptive mesh refinement (FMR/AMR) or particle methods (smoothed particle hydrodynamics 
[SPH], e.g. [13]). Furthermore, for a complete model of stellar collapse and the CCSN postbounce 
phase, a broad spectrum of tightly-coupled physics must be included - ideally accurately, but at 
least approximately. This includes, but is not necessarily limited to, (magneto)hydrodynamics 
(MHD), general relativity (GR), nuclear physics (nuclear EOS and nuclear reactions), neutrino 
radiation-transport, and neutrino-matter interaction microphysics. 

The multi-D, multi-scale, and multi-physics nature of the CCSN problem makes it complex 
and difficult to model and solve computationally. At the same time, however, owing to its 
complexity, physically accurate computational modeling, in combination with future detailed 
observations of CCSN neutrinos and gravitational waveqj, is our only chance of solving it. 

In this contribution to the Scientific Discovery through Advanced Computing (SciDAC) 
Conference 2009, we discuss central aspects of our broad computational approach to stellar 
collapse and the CCSN problem and highlight recently obtained results. In section [2] 

1 By collapsar we mean a stellar collapse event that does not lead to a supernova explosion. If the progenitor star 
has the needed angular momentum distribution, a Collapsar-type Gamma-Ray Burst may occur in a collapsar, 
see e.g., Q. 

2 Gravitational waves are lowest order quadrupole waves. Hence, they are an intrinsically multi-D phenomenon 
and do not exist in spherical symmetry. See [18( for a thorough introduction to gravitational wave theory. 
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Figure 1: Hammer-type (smoothed) map projections of the normalized specific intensity I v {"d, ip)/ J v 
(where </„ is the mean intensity) in model s20.nr at 160 ms after bounce. The color map is logarithmic 
and each individual projection is set up to range from max(/„/J„) (red) to 10 -4 max(/„/ J„) (black). 
Shown is the specific intensity of v e , v ei and "i^" neutrinos at e v = 16.3 MeV on the equator and at 
radii of 30, 60, 120, and 240 km. The Hammer projection is set up in such a way that # varies in the 
vertical from 0° (top) to 180° (bottom) and ip varies horizontally from —180° (left) to +180° (right). 
Grid lines are drawn in ■&- and cp-intervals of 30° . Note (a) that the neutrino radiation helds are isotropic 
at R = 30 km, (b) the increasing forward-peaking of I u with increasing radius (and decreasing optical 
depth), and (c) that at any given radius I u of "z^" is more forward-peaked than that of the v e component, 
which, in turn, is always more forward-peaked than the v e component. This fact is a consequence of a 
transport mean-free path that varies with species (and energy; not shown here) and is smallest for the 
electron neutrinos. 



we introduce the CCSN code VULCAN/2D and present results from the first long-term full- 
2D momentum-space angle-dependent radiation-hydrodynamics simulations of the postbounce 
phase in CCSNe. We go on to describe in section [3] our full-GR 3D stellar collapse simulation 
package Zelmani which is based on the Cactus computational framework and designed 
for massively-parallel execution. Zelmani has already been applied to simulations of rapidly- 
rotating 3D core collapse for which we present results. In Section we wrap up and present a 
forward-looking summary. 



2. Angle-Dependent Neutrino Radiation-Hydrodynamic VULCAN/2D Simulations 

VULCAN/2D is a general Newtonian axisymmetric (2D) radiation-magnetohydrodynamics code 
described in 13, 2^, 21, 2^ and extended and applie d to the stellar collapse and CCSN problems 
in a large number of studies (e.g., [H, [H, SH, Hi, IU, [H, [H, HE S3] ) • VULCAN/2D implements the 
arbitrary Lagrangian-Eulerian (ALE) technique with second-order TVD remap. The scheme is 
directionally unsplit and allows for arbitrary grids. Here we use VULCAN/2D in hydrodynamic 
mode which implements a finite-difference representation of the Newtonian Euler equations with 
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artificial viscosity. VULCAN/2D allows for the use of general EOS tables and for the present study 
we employ the finite-temperature nuclear Shen EOS 28j which is based on a relativistic mean- 
field model for nuclear interactions and transitions to an ideal gas of nuclei, nucleons, photons, 
and electrons at low densities. 

VULCAN/2D implements neutrino transport in two different multi-group (and multi-species) 
ways. The module implementing time-implicit multi-group flux-limited diffusion (MGFLD) was 
described in [lit ] and evolves the angle-averaged mean radiation intensity J u , the zeroth angular 
moment of the specific intensity I u , and uses the flux limiter of [2^]. The angle-dependent 
transport module implements the method of discrete ordinates (S n ) [H, H3] in time-implicit 
fashion at low optical depths and matches to MGFLD at optical depth r > 2 for accelerated 
conversion at high optical depths where the radiation field is isotropic |2l| . Both transport solvers 
employ the neutrino microphysics outlined in [3l| (including nucleon-nucleon bremsstrahlung, 
e.g., [32j), use typically 16 energy groups logarithmically spaced from 2.5 to 220 MeV, and 
consider v e and v e individually, while lumping together z/„, i?n, v T , and v T into X." In both 
MGFLD and S n we neglect neutrino energy-bin coupling (relevant in inelastic scattering) and 
assume the slow-motion approximation to radiation transport appropriate in the postbounce 
phase, neglecting velocity-dependent terms of 0(v/c). As a consequence, neutrino advection, 
Doppler shifts and aberration effects are not considered. This greatly limits the computational 
complexity of the problem, but its impact on the transport solution depends on the particular 
choice of reference frame and was examined in 33j |. Around core bounce and neutrino breakout, 
during the non-linear phase of the SASI hundreds of milliseconds after bounce, and in the case 
of very rapid rotation, including 0(v/c) terms is advisable. Full 0(v/c) Boltzmann transport 
with energy redistribution will be addressed in the future. 

In the S n solver, we discretize the angular radiation distribution evenly in cost? from -1 to 1 
and make the number of y?-bins (running from to ir, because of axial symmetry) a function 
of cos ?9 to tile the hemisphere more or less uniformly in solid angle. In our time-dependent S n 
runs, we employ 8 cosi? bins, resulting in a total of 40 angular zones. Steady-state radiation 
fields are computed either with 8 cosi? bins, 12 cos?? bins (92 total angular zones) or 16 cost? 
bins (162 total angular zones) at each spatial grid point. 

The computational costs for a VULCAN/2D simulation can be estimated based on the single-zone 
update cost for one neutrino group/species, the number of required updates, and the number of 
zones, neutrino energy groups and species. For a typical MGFLD simulation, N zoncs = 40000, 
-^energy groups = 16, N spec i es = 3 and the single-zone update cost including hydro, gravity, and 
one group of MGFLD is ~ 125 flop. Thus, a single timestep requires ~ 240Gflop and a typical 
simulation lasts for ~ 1 million timesteps, leaving us with a total flop count of ~ 250Pflop. In a 
S n simulation, the same single-zone update flop count applies, but the number of zones is scaled 
by a factor equaling the number of momentum-space angular zones modified by an empirical 
correction factor of ~ 0.1 obtained through timing measurements. Thus, a S$ simulation is 
about 4 times more expensive than a MGFLD run and requires ~ 1000 Pflop to complete. 

In order to complete a simulation in reasonable time, we parallelize VULCAN/2D transparently 
and efficiently via MPI in neutrino groups (energy/species). Hence, for 16 x 3 groups we 
obtain a speed-up of almost 48 compared with a single-core calculation, since only scalars are 
communicated at the end of each timestep. Node-local OpenMP parallelism is an additional 
way for increasing performance and is currently under consideration. Domain decomposition, 
however, is not a viable option, since the communication overhead due primarily to the relatively 
small number of zones in a 2D simulation would quickly dominate over any performance gains. 

Assuming a sustained performance of lGflop/core, a MGFLD calculation is completed in 
~ 60 days, while the S n calculations presented here have required 90 — 120 days on 48 cores, but 
were run for only a limited amount of physical postbounce time. 
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2.1. Simulation Setup 

We consider here two models based on the 20-Mq supernova progenitor of (34[. Model s20.nr is 
mapped onto the VULCAN/2D grid without rotation while we impose a precollapse central angular 
velocity of 7rrads _1 in model s20.7r and decrease it slowly with distance from the rotation axis 
according to the simple rotation law specified in [13] . This results in an essentially rigidly- 
rotating PNS core with a period of ~ 2 ms and strong rotational deformation of the entire 
PNS and the postshock region. We choose a VULCAN/2D grid setup with a pseudo-Cartesian 
central region that smoothly transitions to a spherical grid at 20 km. This not only removes 
the coordinate singularity at the origin of spherical grids, but also liberates the PNS core and 
allows for larger hydrodynamic timesteps 12, In the angular direction we employ 120 zones 



and there are 30 logarithmically-spaced radial zones interior to the grid transition and 200 (also 
logarithmically-spaced) radial zones from 20 km to 4000 km. 

Both s20.nr and s20.7r are run with MGFLD to 160 ms after core bounce. At this point, we 
solve for stationary-state angle-dependent S n radiation fields, then evolve the simulations with 
S$ (see [2l[ for resolution tests) over postbounce intervals of 340 ms and 390 ms for model s20.nr 
and s20.7r, respectively. For comparison we also continue the MGFLD variants. 

2.2. Results 

The CCSN problem is particularly challenging in its radiation transport aspects, because 
the energy- and species-dependent neutrino radiation fields transition from being completely 
isotropic inside the PNS (where the diffusion limit applies) to being completely forward-peaked 
at t < 1 (the free-streaming limit). The transition between diffusion and free-streaming is 
handled in an approximate way via the flux limiter in MGFLD (e.g., [1^]), but only angle- 
dependent transport can self-consistently and accurately capture the gradual change of the 
radiation field whose degree of forward-peakedness in the postshock heating region has an 
influence on the neutrino heating efficiency [2ll . HHj. In Fig. [IJ we present map projections of 
the angular distribution of the specific neutrino radiation intensity l v as seen by an equatorial 
observer located at various radii. Shown are the I u of v e , i> e , and X" neutrinos at a neutrino 
energy of e u = 16.3 MeV. Inside the PNS, at R = 30km, all radiation fields are isotropic and 
become increasingly forward peaked with radius, corresponding to decreasing optical depth. 
At any given radius, "i/„"s are more forward-peaked than P e s which, in turn, are always more 
forward-peaked than v e s. This hierarchy is characteristic for the postbounce phase of CCSNe and 
is a result of the species-dependent transport mean- free path systematics in CCSN matter (e.g., 
0). 

In the left panel of Fig. [2] we plot the specific net gain, defined as neutrino heating minus 
neutrino cooling Q~ and contrast the S n results for the nonrotating model s20.nr (left half) with 
the rapidly rotating model s20.7r (right half) at 160 ms after bounce. In addition, we superpose 
velocity vectors to visualize the flow of matter. The qualitative and quantitative differences 
between the two models are large. In model s20.nr, there is strong neutrino heating in the 
immediate postshock region which drives strong convection. One also notes a slight deformation 
of the shock away from spherical symmetry which is indicative of the onset of the SASI. In 
model s20.7r, on the other hand, the neutrino heating occurs very asymmetrically and primarily 
in polar regions where the neutrino flux is highest (see also the discussion in [2l|, 1371 . l38f|). The 



shock is more extended along the poles, but there is a pronounced equatorial bulge with very 
little net heating. Convective overturn is confined to polar regions due to (1) a large positive 
specific angular momentum gra dient at low latitudes, and (2) the absence of strong neutrino 



heating in these regions [371 . |39j, |40J . In addition, no signs of the SASI are apparent and the 



subsequent evolution of model s20.7r suggests that rapid rotation delays and modifies the SASI 



in axisymmetry. The situation may be different in 3D (e.g., [4l|, |42|) 
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Figure 2: Left: Comparison of the net gain (neutrino heating minus cooling) in the nonrotating model 
s20.nr and in the rapidly-rotating model s20.7r as obtained with 2D angle-dependent transport at 160 ms 
after bounce. Due to its rapid rotation, model s20.7r has a larger shock radius, enhanced heating near the 
axis, but due to a positive specific angular momentum gradient in the postshock region at low latitudes, 
also much less convective overturn. Right: Comparison of the v e fluxes at e v = 9.7 McV in model 
s20.7r at 160ms after bounce and as obtained with MGFLD (left section) and S n (right section). The 
radiation fields are oblate in the PNS core and deform to a prolate shape further out. Note that S n 
predicts a prolateness of the radiation field to much greater radii than MGFLD. The latter leads to 
nearly spherically symmetric radiation fields at radii greater than > 150-200 km. This result is largely 
independent of neutrino species and e v . 



The right panel of Fig. [2] visualizes the spectral flux density of u e at e u = 9.7 MeV and at 
160 ms after bounce in the rapidly spinning model s20.7r and contrasts the MGFLD result 
(left half) with that obtained with S n (right half). In both variants, the radiation field is 
oblate inside the PNS core, but quickly transitions to a prolate shape further out. The density 
gradient in the polar regions of the core is much steeper, allowing for much smaller neutrino 
sphere radii {R „ = R{r ~ 2/3)) and resulting in a dramatic enhancement of the polar neutrino 
flux [HI, 37, El]. The MGFLD result captures the overall systematics, but due to the diffusive 
nature of the 2D MGFLD approach, the radiation field asymmetry is smoothed out at low r and 
becomes nearly spherical at radii > 150 km. The angle-dependent variant, on the other hand, 
captures the true radiation-field asymmetry even at large radii. In the case of model s20.7r, 
this results in polar neutrino heating that is locally larger by up to a factor of two than in the 
MGFLD calculation. The radiation field asymmetry in the nonrotating model s20.nr is much 
smaller and the radiation fields predicted by MGFLD and S n are more similar, while the S n 
calculation still yields locally ~ 10% greater heating rates. 

Figure [3] visualizes the dynamical effect of angle-dependent neutrino transport on the postshock 
evolution of CCSNe by comparing the time evolution of the angle-averaged shock radii of the 
MGFLD and S n variants of models s20.nr and s20.7r. In the latter, switching to S n leads to a 
shock expansion primarily in the polar regions where the increase in neutrino heating is greatest. 
However, increased neutrino cooling from the now also extended cooling region counteracts the 
shock expansion and lets it settle at a new, somewhat higher average radius. The larger radius 
and the increased local heating lead to an earlier onset of the rotationally-modined SASI in 
the S n variant. This is reflected in the earlier and more pronounced oscillations of the average 
shock radius. However, towards the end of the postbounce time covered by the simulations, 
the MGFLD variant's average shock radius catches up and there is no large overall difference 
between the dynamics seen in the S n and MGFLD calculations of model s20.7r. 

The dynamical evolutions in the S n and MGFLD variants of the nonrotating model s20.nr are 
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Figure 3: Average shock radius as a function of time after core bounce in MGFLD and S n variants of 
models s20.nr and s20.7r. The large oscillations seen in the curves of the nonrotating model s20.nr are 
due to strong SASI oscillations and are somewhat more pronounced in the S n variant. The average shock 
radius of the rapidly rotating s20.7r settles at larger values than in the nonrotating case, but the growth 
of the SASI is slowed down by rapid rotation and only the S n model with its enhanced polar heating 
exhibits long-period SASI oscillations at intermediate postbounce times. See text and [2l| for discussion. 



even more similar and the SASI shock excursions in the two variants remain practically in phase 
for almost 200 ms. The S n calculation exhibits larger SASI excursions at later times, but, as in 
the case of model s20.7r, there is no qualitative change between the MGFLD and S n postbounce 
evolutions despite stronger neutrino heating (by up to 30% of the total heating rate at late 
times) in the latter. 

2.3. Discussion 

Neutrinos carry away ~ 99% of the gravitational energy of a neutron star formed in stellar 
collapse. Their transport and their interactions with matter are a central aspect of any CCSN 
model. Here, we have presented and summarized results from our recent VULCAN/2D simulations 



2ll ] that for the first time addressed for a multi-D simulation the long-term dependence of the 
postbounce dynamics of CCSNe on the neutrino transport technique employed. Comparing 
MGFLD and angle-dependent S n transport, we find that the former has difficulties in capturing 
physical radiation field asymmetries and preserving them at low optical depth. The S n approach 
self-consistently evolves the radiation field from the diffusion limit and isotropy to the free- 
streaming limit and forward-peakedness. S n generally leads to locally stronger neutrino heating, 
but the feedback in the supernova engine is sufficiently strong to lead to an adjustment of the 
system to a new equilibrium configuration without a true qualitative change when compared with 
MGFLD. Thus, it appears unlikely that differences in neutrino transport and/or interactions 
that result in by 10% — 30% increased heating (as seen in our simulations) can have a strong 
impact on the CCSN dynamics. Larger effects seem necessary to turn the CCSN dud into an 
explosion. 

The work of the Garching group @, 53 and of the ORNL/FAU group 0] suggest that 
the inclusion of a general-relativistic monopole term in the otherwise Newtonian potential in 
combination with a softH EOS can increase the neutrino heating efficiency due to hardened 

3 These groups use the K = 180 MeV variant of the Lattimer-Swesty EOS which is too soft to support NSs with 
gravitational masses above ~ 1.7 Mq. 
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neutrino spectra and may lead to explosion. Furthermore, [6( have shown in a simplified model 
with parametrized neutrino heating and cooling that explosions are more easily obtained in 2D 
than in ID. This result may very well extend to 3D and supports and strengthens the CCSN 
community's motivation to move towards 3D simulations. 



3. A New, Fully General-Relativistic Approach in 3D 

Today's technically most advanced and physically most complete stellar collapse and long- 
term postbounce CCSN simulations are carried out in axisymmetry (2D), use angle-dependent 
neutrino transport or MGFLD either in full 2D [ll|, 03, [U, [44| or in a ray-by-ray 
approximation [5|, 0, [H, 45] and implement Newtonian (magneto) hydrodynamics and gravity 



either in purely Newtonian fashion [ill . Il2l. J2l|. 1441 or with a GR monopole term in an 
otherwise Newtonian gravitational potential [aTM. 143. [45| . The current state-of-the-art for 
axisymmetric GR core-collapse calculations, which traditionally have focussed on estimates of 
the GW signal from rotating core collapse and bounce, is set by [46|] who performed conformally- 
flalQ calculations of the collapse and early postbounce phase with a finite-temperature (T) 
microphysical EOS and a simple deleptonization scheme [5l|] for the collapse phase in lieu of 
neutrino transport. 

Current published 3D simulations do not yet rival their 2D counterparts in accuracy and 



physical completeness. Iwakami et al. [42 . l52j ] investigated the SASI in 3D with steady-state 



initial conditions, spherical Newtonian gravity, a finite-T microphysical EOS, parametrized 
neutrino heating/cooling, a cut-out core, and a fixed high accretion rate at the outer boundary. 
The Basel group [H3], focussing on the GW signal of stellar collapse, performed 3D calculations 
of the collapse phase with a finite-T microphysical EOS and the deleptonization scheme of [HH 
during collapse, but neglected neutrino transport and heating/cooling in the postbounce phase. 

More physically accurate simulations with better neutrino physics and transport in the 
postbounce phase are required to address the explosion mechanism. Several groups are in the 
process of implementing codes with the necessary features (e.g., [51 145|. [H3] ) . 

In the following, we present our approach to 3D CCSN modeling which builds upon 
the tremendous recent progress in numerical relativity (see, e.g., (55[) and implements GR 
hydrodynamics and full GR curvature evolution in a variant of the Arnowitt-Deser-Misner 
(ADM) [5f| 3 + 1 formalism. This allows us to not only more accurately follow the CCSN 
hydrodynamics, but provides for the capability to form black holes dynamically and in 3D in 
failing core-collapse supernovae - something that is impossible in Newtonian or pseudo-GR 
formulations. Our approach makes heavy use of the Cactus computational toolkit [l^] and is 
optimized for execution on supercomputers, implementing AMR, domain decomposition, parallel 
I/O, and hybrid MPI/OpenMP parallelism for improved scaling on massively-parallel systems. 

In Section \3.1\ we introduce our GR curvature and hydrodynamics formulation and discuss the 
computational infrastructure and the various physics components of our approach, and highlight 
parallel scaling results. In Section 13.21 we go on to discuss results obtained with our approach 
by [H, 4^, 5?J in the first set of GR simulations of rotating core collapse in 3D. 



3.1. Computational Approach, Application Codes and Parallel Performance 
We employ the open source software framework Cactus [l^, [58| designed for computational 
scientific and engineering problems. It has a modular structure and enables scalable parallel 
computation across different architectures, as well as collaborative code development between 
different research groups. 

4 The conformal flatness condition (CFC) is an approximation to GR in which the radiative degrees of freedom 
have been suppressed [47| . CFC is exact in spherical symmetry and is accurate to < 5% in the core-collapse 
scenario [H, Eg Q ■ 
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Cactus consists of a central part, called the flesh, that provides core routines, and of 
components, called thorns. The flesh is independent of all thorns and provides the main program, 
which parses input parameters and activates the appropriate thorns, passing control to thorns as 
required. By itself, the flesh does very little science; to do any computational task the user must 
compile in thorns and activate them at run time. Parallelism, communication, load balancing, 
memory management, and I/O are handled by a special component, the driver, which is not 
part of the flesh and which can be transparently exchanged. The flesh (and the driver) have 
complete knowledge about the state of the application, allowing inspection and introspection 
through generic APIs. 

Cactus runs on all current mainstream architectures. Applications, developed on standard 
workstations or laptops, can be seamlessly run on clusters or supercomputers. Cactus provides 



easy access to many cutting-edge software technologies bein 
community, including the Globus Metacomputing Toolkit 



the PETSc scientific library [62], AMR, multi-block methods 
advanced visualization tools (e.g. Visit [651] ). 




oped in the academic research 
HDF5 (EH parallel file I/O, 
6311 . web interfaces \6M, and 



The Einstein Toolkit [66j] is a set of Cactus thorns providing infrastructure and basic 
functionality for GR applications codes using the variables of the ADM formalism [56]]. Within 
a simulation, the ADM variables are employed for coupling GR curvature evolution with matter 
and radiation variables and also serve in run-time anal ysis of the simulation results, such as e.g. 
evaluating constraints, locating apparent horizons 67. |68| or event horizons [6^], or calculating 
gravitational wave signals. 



3.1.1. GR Curvature Evolution. The Einstein equations are necessary for the correct 
description of gravity in the strong-field regime. They are a set of ten coupled, non-linear 
wave- type partial differential equations. We solve these equations using the BSSN formulation 
(e.g., [70[] and references therein). This formulation, similar to ADM, breaks up the four- 
dimensional spacetime into 3 + 1 dimensions, three spatial and one time dimensions. This leads 
to 25 hyperbolic time evolution equations coupled to 9 elliptic constraint equations. 

Of the 25 evolution equations, 8 are not specified by the Einstein equations, but instead have 
to be chosen as gauge conditions to determine the time evolution of the curvilinear coordinates 
of s pac etime. We choose the so-called 1 + log slicing condition and the T-driver shift condition 



57j, [7l[, which are standard gauge conditions used BSSN. They ensure stable, long-term time 



evolutions. The 9 constraint equations have to be satisfied initially, requiring solving an elliptic 
system for setting up initial data, and remain then satisfied under time evolution up to within the 
discretization error. We monitor the constraints during time evolution and do not re-solve them. 
The resulting equations for the BSSN system and the gauge conditions can be time-evolved with 
standard discretization methods. 



We are using the Kranc package [72J, [73J] for automatic code generation for the BSSN 



formulation, gauge conditions, and constraint equations. Kranc is a Mathematica package 
which generates Cactus thorns from equations. Starting from equations in Mathematica format 
which specify a system of PDEs in abstract index notation, Kranc discretizes the equations and 
generates a complete Cactus thorn that evaluates these equations. Kranc-generated thorns use 
all relevant Cactus APIs for initial data setup, analysis, time integration, and AMR. 

Automatic code generation greatly reduces the time and effort necessary to implement the 
BSSN equations, since these contain about 5,000 individual terms. Kranc also allows us to 
experiment with modifications to the formulation, e.g. to increase accuracy near singularities, 
and with modifications to low-level implementation details (loop blocking, vectorization) to 
achieve higher efficiencies on modern computer architectures. 

We have implemented the BSSN equations in an open-source Cactus thorn arrangement called 
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McLachlan [H, [H in the XiRel [H, E3| project. McLachlan is a full-featured BSSN solver able 
to simulate not only stellar collapse, but also relativistic binary systems of black holes or neutron 
stars \7l 



3.1.2. GR Hydrodynamics, Microphysics and Neutrinos On the GR hydrodynamics (GRHD) 



side, we implement the Valencia formalism of GRHD [78], |79j] which is based on a dimensionally- 
split flux-conservative high-resolution shock-capturing finite-volume approach. These methods 
are generally of lower order than those used for the spacetime curvature evolution, i.e., 
second-order in space and second-order in time. The main reason for this is not only that 
the discretization of the GR hydrodynamics problem becomes much more complicated with 
increasing order, but also that all higher-order methods must drop back to first order near 
discontinuities in the flow in order to prevent spurious oscillations in the solution. GR 
magnetohydrodynamics (GRMHD) is an extension to GRHD and can be straightforwardly 



implemented within the same general formalism [8CJ, [81 ] . 



The Zelmani CCSN package proper is a collection of Cactus thorns implementing; GRHD, a 



finite-temperature nuclear EOS, neutrino leakage, and heating [4gJ, |49j, [57j, [82|]- We generally 
use the name Zelmani synonymously to refer to the entire set of codes involved in our 3D GR 
CCSNe simulations, including McLachlan, Carpet, CactusEinstein, and Cactus. The GRHD 



module is based on a modified version of the open-source Whisky code [83l . |84( that allows 
for general, finite-temperature EOS and neutrino-matter interactions. In our simulations, we 
employ PPM reconstruction of variables at cell interfaces and the approximate HLLE solver 85j 
for the relativistic Riemann problem. 

Fully relativistic neutrino radiation transport is a formidable problem. Multi-D GR 
formulations exist (e.g., (86l . l87|). but are still awaiting their first implementations. While 
we are actively exploring various ways to implement GR transport, we presently resort to 
the deleptonization scheme of [HH for collapse and employ neutrino leakage (e.g., (88l |) and 
parametrized heating in the postbounce phase 0, Neutrino pressure contributions are 

included via the approximation discussed in [5l| . 

3.1.3. Adaptive Mesh Refinement and Parallel Scaling. Carpet 6^, 8^, 9(| is our AMR driver 
for the Cactus framework. Carpet acts as a driver layer for Cactus, providing adaptive mesh 
refinement, multi-patch capability, and efficient parallelization and I/O. We make both Cactus 
and Carpet publicly available as open source, and both are also used by a number of other 
numerical relativity and computational astrophysics groups. 

Carpet provides spatial discretization based on highly efficient block-structured, decomposed, 
logically Cartesian grids with hybrid MPI/OpenMP [9ll . parallelism. Carpet offers both 
AMR and multi-block capabilities, covering the domain with sets of distorted, logically 
rectangular blocks of grids. Time integration is performed via the recursive Berger-Oliger AMR 
scheme including subcycling in time. As demonstrated in the left panel of Fig. HI Carpet 
presently scales well to more than 16,000 cores with AMR on Leadership HPC systems. 

Carpet employs HDF5 j6ll ] for parallel, binary I/O, which is also used for checkpointing and 
restarting. The right panel of Fig. [J] depicts results of I/O benchmarks, comparing the achieved 
I/O bandwidth to the theoretical peak bandwidth on selected HPC systems. Carpet achieves a 
significant fraction of this already on about 1,000 cores. 

Although the speed and performance of high-end computers have increased dramatically over 
the last decade, the ease of programming such parallel computers has not progressed. To 
address these issues in computational modeling in general and in numerical relativity and CCSN 



simulations in particular, we are developing the Alpaca tools [94l . l95j |. In contrast to existing 



debuggers and profilers, these tools work at the much higher level of the physical equations and 
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Figure 4: Left: Weak scaling benchmark for spacetime curvature and GRHD evolution of a single 
neutron star (Ranger, Kraken, Queen Bee) or of a single black hole (Franklin) with nine levels of mesh 
refinement. Ideal scaling would be an horizontal line in both cases. Cactus/Carpet scales well up to more 
than 16,000 cores. Ranger and Kraken are the currently largest machines on the NSF TeraGrid, Queen 
Bee is the largest machine on LONI and Franklin is a large DoE/NERSC system. Right: Cumulative 
I/O bandwidth for writing checkpoint files in parallel. This measurement includes application overhead. 
The theoretical peak bandwidth is shown as an horizontal line. Cactus/Carpet is able to achieve a 
significant fraction of this. 



their discretizations and not at the level of individual lines of code or variables. The Alpaca tools 
are not external to the application, but are built-in, so that they have direct high-level access to 
information about the running application, and can interact with the user on a correspondingly 
high level. 

3.2. Computational Costs, Simulations and First Results 

A typical simulation setup to track collapse and postbounce CCSN evolution with GR curvature 
evolution and GRHD uses a 9-level AMR grid hierarchy with ~ 400 3 computational zones 
each, providing high resolution near the center (8x < 300 m), while encompassing the inner 
~ 5000 km of the dying star. There are about 400 3D grid functions required for curvature 
and GRHD which translates to a memory footprint of ~ 2 TB (including inter-process buffers 
assuming 1024 processes). A single point update requires ~ 50kflop and ~ 1 million fine grid 
updates are required, resulting in a total cost of ~ 1500 Pflop for a single simulation. On 1024 
compute cores and assuming a sustained performance of IGnop/s per core, such a simulation 
requires ~ 17 days to complete. If radiation transport, even in approximate ray-by-ray fashion, 
is included, the memory footprint and the total number of required flops must be scaled by a 
factor of 10 - 100. 

As a first application of our Cactus-based 3D GR Zelmani CCSN code, we are considering 
the collapse and very early postbounce phase of rapidly rotating iron cores with an emphasis on 
the study of rotational multi-D dynamics leading to the emission of gravitational waves (GWs). 
The latter may, in combination with neutrinos, play an important future role as diagnostic tools 
for the CCSN mechanism 0, [9f| and in the case of rotating collapse can provide information on 
the nuclear EOS, as well as on the rotation rate of the inner core at bounce [4rj |. 

We have carried out the first parameter study of rotating core collapse in 3D GR, investigating 
the dependence of dynamics and GW signal on progenitor stellar structure and precollapse 
rotational setup, specified by the degree of differential rotation and the initial central angular 
velocity. Since these simulations were run with an earlier version of Zelmani, postbounce 
deleptonization was neglected. The results of this study were extensively discussed in il, 4^, 57] 
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Figure 5: Left: Gravitational wave (GW) signals of the + polarization (top, h+ D, where D is the source 
distance) and of the x polarization (bottom, h x D), emitted by model E20 as a function of postbounce 
time and as seen by polar (red lines) and equatorial (black lines) observers. Collapse and bounce are 
axisymmetric (only h+ and emission only away from the symmetry axis) while at postbounce times 
nonaxisymmctric dynamics and GW emission develop. Right: Density colormap showing the inner 
150x150 km of the equatorial plane (z = 0) at 71 ms after bounce in the 3D GR model E20A. Velocity 
vectors are superposed and mark spiral density waves that develop due to a nonaxisymmctric instability 
in the PNS and propagate through the postshock region. 



and we highlight a number of the findings in the following. 

In the left panel of Fig. [5] we plot the two GW polarizations h + and h x , both scaled by 
distance D to the source and in units of centimeters, as extracted from the multi-D dynamics 
in our model E20A that is based on the rotating 20- M progenitor star of Its precollapse 
central angular velocity is ~ 3.1rads _1 and its precollapse rotation rate = T/\W\ is ~ 0.004. 
The corresponding values early after bounce are ~ 4radms _1 and ~ 0.09. The simulation is 
run in 3D from the onset of collapse to ~ 70 ms after bounce. 

We find that model E20A, like all other considered models, stays axisymmetric through 
collapse, core bounce, and the early postbounce phase. At bounce, the large deceleration of the 
inner core's infall leads to a large negative spike in the waveform clearly visible in h + D as seen 
by an equatorial observer and shown in the left panel of Fig. Due to the axisymmetry of the 
rotating collapse dynamics, GWs are emitted only in one polarization (linear polarization; here, 
due to the choice of source orientation, in the + polarization) and only away from the symmetry 
axis of the system. After bounce and on a timescale of > 20 — 40 ms nonaxisymmetric dynamics 
develops in the PNS and is likely due to a corotation-type instability in which an azimuthal 
mode picks up power from the axisymmetric background rotation at the point where its mode 
pattern speed is in corotation with the fluid (see, e.g., [llj). The quadrupole components 
{I = 2, m = 2) of the nonaxisymmetric dynamics are reflected in the late-time GW signal as 
a quasi-periodic, elliptically polarized signal, strongly correlated in /i + and h x and emitted at 
twice the m = 2 pattern speed. This nonaxisymmetric GW emission is smaller in amplitude 
than the burst associated with core bounce, but, due to its longer duration, the total energy 
emission is greater and the emission's narrowband nature favors detection by GW observatories 
such as LIGO [H, 

Corotation instabilities are well known from studies of astrophysical disks and belong to a class 
of dynamical shear instabilities that draw from the shear energy stored in differential rotation 
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and transport angular momentum outward in spiral waves. Differential rotation is abundant in 
the postshock region 40j, |57| and the spiral waves are apparent in the right panel of Fig. [5] that 
depicts the density distribution with superposed fluid velocity vectors in the equatorial plane of 
model E20A at ~ 70 ms after bounce. 



4. Summary and Outlook 

The challenging complexity and non-linearity of the CCSN problem and current technical and 
computational limitations call for a broad computational program with multiple modeling 
approaches. In this contribution to SciDAC 2009, we introduced two CCSN simulation 
programs and discussed their recent results. The first, VULCAN/2D, a full CCSN code with 
an implementation of the angle-dependent and MGFLD radiation-MHD equations, is capable 
of studying all presently discussed CCSN mechanisms, but is limited to axisymmetry and 
Newtonian gravity and dynamics. Being based on proven legacy technology and despite 
not employing the massively-parallel domain-decomposition paradigm, VULCAN/2D runs very 
efficiently on a modest number of compute cores and yields a high science output per flop (e.g., 
[H E2, EE EI S3, SI HI, IH, Ezl ) . This includes the first long-term true multi-D angle-dependent 
neutrino transport CCSN calculations highlighted in this article. The result of this study is an 
example of Mazurek's /awjf|. Applied to the present situation, it states that in the tightly- 
coupled CCSN phenomenon, even a rather significant (> 10% — 30%) change of the postbounce 
conditions, in this case, of the neutrino heating rate, is absorbed by the strong feedback between 
radiation, hydrodynamics, EOS, and gravity and no qualitative change results. 

The frontier of CCSN modeling is clearly 3D and the hope is that the additional degree of 
freedom and the more accurate representation of (turbulent) convection and SASI help produce 
successful and powerful explosions whose asymptotic energies match observations. We approach 
the computationally and technically challenging step to 3D with the code Zelmani that is based 
on the open-source Cactus framework and uses scalable AMR via the Carpet driver module 
that implements state-of-the-art HPC paradigms, including hybrid MPI/OpenMP parallelism 
for modern massively-parallel multi-core architectures. Zelmani is set apart from other 3D 
codes (e.g., 0, 0, [HE) by its full GR nature, evolving Einstein's equations from one spatial 
3-hypersurface to the next and treating the dynamics fully relativistically instead of re-solving 
a Newtonian or pseudo-relativistic Poisson equation at every timestep coupled to Newtonian 
hydrodynamics. Dynamical spacetime evolution not only enables us to study the impact of GR 
on the CCSN evolution, but also allows us to investigate the dynamical formation of black holes 
in failing core-collapse supernovae. Such collapsars are considered as likely candidates for the 
central engines of gamma-ray bursts (GRBs; e.g., El)) but their dynamical formation has not 
been modeled and remains to be understood. 

While still under development towards a full GR radiation-MHD CCSN code, Zelmani has 
already been applied to 3D GR hydrodynamic studies of rotating stellar collapse, leading to 
an improved understanding of the gravitational wave signature of CCSNe [i^ . 49 . 57|. In its 
completed form, Zelmani will allow us to take full advantage of petaflop supercomputers to 
comprehensively address the CCSN problem and the CCSN-GRB relationship. 
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